
Prototype batching lock intent for skip locked #31344

Draft
emilienoel wants to merge 1 commit into yugabyte:master from Shopify:prototype_batching_intent_SKIP_LOCKED

Conversation

@emilienoel emilienoel commented Apr 29, 2026

Batched SKIP LOCKED Walkthrough

This document walks through the rebased change and identifies whether each step happens in the YSQL layer or on the tserver side.

Definitions:

  • YSQL: PostgreSQL/YSQL layer, including executor and PgGate client-side API.
  • tserver: tablet server / DocDB side.

1. Enable batching via GUC

Where: YSQL

Files:

src/postgres/src/backend/utils/misc/guc.c
src/postgres/src/include/utils/guc.h

Adds:

yb_skip_locked_batch_size

Default:

32

Meaning:

  • 1 disables the optimization.
  • >1 allows YSQL executor to prefetch multiple candidate rows for SKIP LOCKED.

2. Detect eligible SKIP LOCKED query

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

In ExecLockRows, the batch path is used only when:

node->yb_are_row_marks_for_yb_rels &&
yb_skip_locked_batch_size > 1 &&
list_length(node->lr_arowMarks) == 1 &&
erm->waitPolicy == LockWaitSkip

So YSQL decides whether to use the optimization: the plan must lock a single YB relation (exactly one row mark) with the SKIP LOCKED wait policy.
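Taken together, the four conditions form a single predicate. A minimal standalone sketch (the enum and function name here are illustrative stand-ins, not the actual PostgreSQL types):

```c
#include <stdbool.h>
#include <assert.h>

/* Stand-in for PostgreSQL's LockWaitPolicy values. */
typedef enum { LOCK_WAIT_BLOCK, LOCK_WAIT_SKIP, LOCK_WAIT_ERROR } WaitPolicy;

/* Model of the batch-path eligibility check in ExecLockRows:
 * YB relations only, batching enabled, one row mark, SKIP LOCKED. */
static bool
use_batch_skip_locked(bool row_marks_for_yb_rels,
                      int yb_skip_locked_batch_size,
                      int num_row_marks,
                      WaitPolicy wait_policy)
{
    return row_marks_for_yb_rels &&
           yb_skip_locked_batch_size > 1 &&
           num_row_marks == 1 &&
           wait_policy == LOCK_WAIT_SKIP;
}
```

Note that setting the GUC to its disabled value (1) falsifies the second conjunct, so the existing single-row path is taken unchanged.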


3. Prefetch candidate rows from the child plan

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

Function:

ExecLockRowsBatchSkipLocked(...)

YSQL pulls up to yb_skip_locked_batch_size rows from the child plan:

slot = ExecProcNode(outerPlan);

For each candidate row, YSQL extracts the ybctid:

datum = ExecGetJunkAttribute(slot, aerm->ctidAttNo, &isNull);

Then stores:

batch_ybctids[batch_count] = datumCopy(...);
batch_tuples[batch_count] = ExecCopySlotHeapTuple(slot);

At this point, YSQL has a local batch of candidate rows.
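The prefetch loop can be sketched as follows. The child plan is reduced to an array of ybctid strings; exec_proc_node and copy_str are illustrative stand-ins for ExecProcNode and the datumCopy/ExecCopySlotHeapTuple copies:

```c
#include <stdlib.h>
#include <string.h>

#define MAX_BATCH 32

typedef struct {
    const char **rows;   /* remaining child-plan rows (ybctids as strings) */
    int          nrows;
    int          pos;
} ChildPlan;

/* Returns the next ybctid, or NULL when the child plan is exhausted. */
static const char *exec_proc_node(ChildPlan *plan)
{
    return plan->pos < plan->nrows ? plan->rows[plan->pos++] : NULL;
}

/* Stands in for datumCopy: the batch must own its copies, since the
 * source slot is reused on the next ExecProcNode call. */
static char *copy_str(const char *s)
{
    char *dup = malloc(strlen(s) + 1);
    strcpy(dup, s);
    return dup;
}

/* Pull up to batch_size candidates from the child plan.
 * Returns the number of candidates collected. */
static int prefetch_batch(ChildPlan *plan, int batch_size,
                          char *batch_ybctids[MAX_BATCH])
{
    int batch_count = 0;
    while (batch_count < batch_size && batch_count < MAX_BATCH) {
        const char *ybctid = exec_proc_node(plan);
        if (ybctid == NULL)
            break;                       /* child plan exhausted */
        batch_ybctids[batch_count++] = copy_str(ybctid);
    }
    return batch_count;
}
```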


4. Send candidate ybctids to PgGate

Where: YSQL

File:

src/postgres/src/backend/access/yb_access/yb_scan.c

Function:

YBCLockTupleBatch(...)

YSQL creates a new select statement:

YbcPgStatement ybc_stmt = YbNewSelect(...);

Then it adds each candidate ybctid:

YBCPgDmlAddBatchYbctidArg(ybc_stmt, data, len);

This is still YSQL-side code preparing the request.


5. PgGate appends batch arguments to the read request

Where: YSQL / PgGate client side

Files:

src/yb/yql/pggate/ybc_pggate.cc
src/yb/yql/pggate/pggate.cc
src/yb/yql/pggate/pg_dml_read.cc

Call chain:

YBCPgDmlAddBatchYbctidArg(...)

calls:

PgApiImpl::DmlAddBatchYbctidArg(...)

which calls:

PgDmlRead::AddBatchYbctidArg(...)

That appends each candidate to:

read_req_->add_batch_arguments();

So the request sent to the tserver contains:

batch_arguments = [ybctid0, ybctid1, ybctid2, ...]

This is PgGate building the tserver request, but it is on the YSQL side of the boundary.


6. Execute/fetch the YSQL statement, causing RPC to tserver

Where: starts in YSQL, crosses to tserver

File:

src/postgres/src/backend/access/yb_access/yb_scan.c

YSQL calls:

YBCPgExecSelect(ybc_stmt, &exec_params);
YBCPgDmlFetch(ybc_stmt, 0, NULL, NULL, NULL, &has_data);

The fetch is what forces the PgGate operation to perform the remote read RPC.

Boundary:

YSQL/PgGate  --->  tserver

7. tserver detects special batch SKIP LOCKED request

Where: tserver

File:

src/yb/tserver/read_query.cc

In ReadQuery::DoPerform, tserver checks:

has_row_mark &&
!serializable_isolation &&
req_->pgsql_batch_size() == 1 &&
pgsql_read.wait_policy() == WAIT_SKIP &&
pgsql_read.batch_arguments_size() > 1

If true, tserver chooses the new path:

TryLockBatchArg(0, ...);

So the tserver, not YSQL, decides how to process the batched lock request internally.


8. tserver tries to lock one batch argument at a time

Where: tserver

File:

src/yb/tserver/read_query.cc

Function:

ReadQuery::TryLockBatchArg(...)

For each candidate index, tserver builds a write operation that creates read-lock intents for only that candidate.

It calls:

tablet_ptr->CreateReadIntentForBatchArg(
    isolation_level,
    pgsql_read,
    batch_arg_index,
    &write_batch);

Then it submits:

peer->WriteAsync(std::move(query));

This is tserver-side async write/conflict resolution.


9. Tablet builds intents for exactly one candidate

Where: tserver

Files:

src/yb/tablet/tablet.cc
src/yb/tablet/tablet.h
src/yb/docdb/pgsql_operation.cc
src/yb/docdb/pgsql_operation.h

Call chain:

Tablet::CreateReadIntentForBatchArg(...)

calls:

docdb::GetIntentsForBatchArg(...)

That function picks one candidate:

const auto& batch_argument = request.batch_arguments(batch_arg_index);

and creates lock intents only for that candidate.

This is important: tserver does not lock all batch arguments at once.


10. tserver handles lock success or conflict

Where: tserver

File:

src/yb/tserver/read_query.cc

In the callback from WriteAsync:

If the lock succeeds

self->first_locked_batch_arg_index_ = batch_arg_index;
peer->Enqueue(self.get());

The tserver records the winning index and proceeds to read the row.

If the lock conflicts / transaction error occurs

TransactionError(status).value() != TransactionErrorCode::kNone

Then because this is SKIP LOCKED, tserver skips this candidate and tries the next one:

self->TryLockBatchArg(batch_arg_index + 1, ...);

If all candidates conflict

Eventually:

batch_arg_index >= total_batch_args

Then tserver sets:

first_locked_batch_arg_index_ = -1;

and proceeds to read phase with no winner.
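The retry logic of steps 8 through 10 reduces to "walk the candidates in order and stop at the first lock that succeeds". A synchronous sketch (the real path is asynchronous via WriteAsync callbacks; the outcome table here stands in for conflict resolution):

```c
#include <stdbool.h>

/* Demo stand-in for the WriteAsync/conflict-resolution round trip:
 * ctx points at a per-candidate outcome table. */
static bool lock_outcomes_try(int index, void *ctx)
{
    const bool *can_lock = ctx;
    return can_lock[index];
}

/* Model of the TryLockBatchArg loop: try candidates in order, skip each
 * one that conflicts (SKIP LOCKED semantics), and record the first index
 * whose lock succeeds. Returns the winning index, or -1 when every
 * candidate conflicts. */
static int first_lockable_index(int total_batch_args,
                                bool (*try_lock)(int index, void *ctx),
                                void *ctx)
{
    for (int i = 0; i < total_batch_args; i++) {
        if (try_lock(i, ctx))
            return i;        /* first_locked_batch_arg_index_ = i */
    }
    return -1;               /* all candidates conflicted */
}
```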


11. tserver reads only the winning candidate

Where: tserver

File:

src/yb/tserver/read_query.cc

Function:

ReadQuery::DoReadImpl()

The original request still contains all candidates:

batch_arguments = [row0, row1, row2, row3]

But after locking, tserver creates a modified effective request.

If there is a winner:

modified_req->clear_batch_arguments();
*modified_req->add_batch_arguments() = batch_argument_for_winner;
effective_req = modified_req;

So the actual read phase reads only the winning row.

If no candidate was locked:

modified_req->clear_batch_arguments();
effective_req = modified_req;

So the read returns zero rows.
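The winner-only rewrite can be modeled like this, with the request reduced to a fixed-size array of integer ybctids for illustration:

```c
#define MAX_ARGS 32

typedef struct {
    int ybctids[MAX_ARGS];   /* candidate ybctids, reduced to ints */
    int nargs;
} ReadRequest;

/* Model of the effective-request rewrite in ReadQuery::DoReadImpl:
 * clear the batch arguments, then re-add only the winner; when there is
 * no winner (winner < 0), the argument list stays empty and the read
 * phase returns zero rows. */
static ReadRequest make_effective_request(const ReadRequest *req, int winner)
{
    ReadRequest modified = {{0}, 0};          /* clear_batch_arguments() */
    if (winner >= 0 && winner < req->nargs) {
        modified.ybctids[0] = req->ybctids[winner];
        modified.nargs = 1;                   /* add the winning argument */
    }
    return modified;
}
```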


12. tserver populates response metadata

Where: tserver

File:

src/yb/tserver/read_query.cc

After reading, tserver sets:

result.response->set_batch_arg_count(pgsql_read_req.batch_arguments_size());

That tells PgGate:

All batch arguments were consumed/tried.

If there was a winner, tserver also sets:

result.response->set_first_locked_batch_arg_index(first_locked_batch_arg_index_);

This field was added in:

src/yb/common/pgsql_protocol.proto

as:

optional int32 first_locked_batch_arg_index = 22;

So the tserver response carries the winning candidate index back to YSQL.

Boundary:

tserver  --->  YSQL/PgGate

13. PgGate reads the winner index from response

Where: YSQL / PgGate client side

Files:

src/yb/yql/pggate/pg_doc_op.cc
src/yb/yql/pggate/pg_dml_read.cc
src/yb/yql/pggate/pggate.cc
src/yb/yql/pggate/ybc_pggate.cc

Call chain:

YBCPgDmlGetFirstLockedBatchArgIndex(...)

calls:

PgApiImpl::DmlGetFirstLockedBatchArgIndex(...)

then:

PgDmlRead::GetFirstLockedBatchArgIndex()

then:

PgDocReadOp::GetFirstLockedBatchArgIndex()

which reads:

resp->first_locked_batch_arg_index()

If absent, it returns:

-1

14. YSQL maps the winner index to a PostgreSQL lock result

Where: YSQL

File:

src/postgres/src/backend/access/yb_access/yb_scan.c

Back in:

YBCLockTupleBatch(...)

YSQL gets:

winner

Then:

if (winner >= 0 && winner < count)
{
    *locked_index = winner;
    res = TM_Ok;
}
else
{
    *locked_index = -1;
    res = TM_WouldBlock;
}

So YSQL converts tserver's response into PostgreSQL executor semantics:

  • winner found: TM_Ok
  • no winner: TM_WouldBlock
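The mapping is small enough to model directly (TM_Ok and TM_WouldBlock here are stand-ins for PostgreSQL's TM_Result values):

```c
typedef enum { TM_Ok, TM_WouldBlock } TmResult;

/* Model of the winner-to-lock-result mapping in YBCLockTupleBatch:
 * a valid winner index means one candidate was locked (TM_Ok);
 * anything else means the whole batch conflicted (TM_WouldBlock). */
static TmResult map_winner(int winner, int count, int *locked_index)
{
    if (winner >= 0 && winner < count) {
        *locked_index = winner;
        return TM_Ok;
    }
    *locked_index = -1;
    return TM_WouldBlock;
}
```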

15. YSQL returns the winning tuple

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

Back in:

ExecLockRowsBatchSkipLocked(...)

If the batch lock returned TM_Ok, YSQL restores the winning tuple into the result slot:

ExecForceStoreHeapTuple(batch_tuples[locked_index], result_slot, true);

Then it returns that tuple to the upper executor nodes.

So if the query was:

SELECT * FROM jobs FOR UPDATE SKIP LOCKED LIMIT 1;

this is the point where the selected row goes back up the executor tree.


16. YSQL saves candidates after the winner as leftovers

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

Suppose the batch was:

[row0, row1, row2, row3]

and tserver locked:

row1

Then:

  • row0 was tried and skipped.
  • row1 is returned.
  • row2, row3 were prefetched but not tried.

YSQL stores candidates after the winner:

node->yb_batch_leftover_tuples
node->yb_batch_leftover_ybctids

These are used on later calls to ExecLockRows.
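The leftover bookkeeping amounts to keeping every candidate positioned after the winner. A minimal sketch, with candidates reduced to integers:

```c
/* Model of the leftover split: candidates before the winner were tried
 * and skipped on the tserver, the winner is returned, and everything
 * after the winner is saved for later ExecLockRows calls. Returns the
 * leftover count (node->yb_batch_leftover_count in the real code). */
static int save_leftovers(const int *batch, int batch_count, int winner,
                          int *leftovers)
{
    int n = 0;
    for (int i = winner + 1; i < batch_count; i++)
        leftovers[n++] = batch[i];
    return n;
}
```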


17. YSQL tries leftovers before scanning more rows

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

At the top of ExecLockRows, before fetching a new row from the child plan, YSQL checks:

if (node->yb_batch_leftover_count > node->yb_batch_leftover_idx)

If leftovers exist, it calls:

ExecLockRowsTryLeftover(...)

That tries each leftover using the existing single-row path:

YBCLockTuple(...)

The leftover entries themselves do not store a table id or relation key. They only store parallel
arrays of:

ybctid
tuple

The table is recovered from the same LockRowsState row mark:

ExecAuxRowMark *aerm = (ExecAuxRowMark *) linitial(node->lr_arowMarks);
ExecRowMark *erm = aerm->rowmark;

and the lock call uses:

YBCLockTuple(erm->relation, ybctid, erm->markType, LockWaitSkip, estate);

This is safe because the batch optimization is only enabled when:

list_length(node->lr_arowMarks) == 1

So every prefetched candidate and every leftover belongs to the single row-marked YB relation for
this LockRowsState. For multi-row-mark queries, such as joins with multiple locked tables, the
batch path is not used because a bare leftover ybctid would not be enough to identify which
relation to lock.

So after the first batch winner, later prefetched-but-untried rows are not lost.


18. YSQL cleans up leftovers

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

In:

ExecEndLockRows(...)

YSQL frees any unconsumed leftover tuples and ybctids.


Compact end-to-end sequence

| Step | Layer | What happens |
|------|-------|--------------|
| 1 | YSQL | GUC yb_skip_locked_batch_size controls batch size |
| 2 | YSQL | ExecLockRows detects eligible FOR UPDATE SKIP LOCKED |
| 3 | YSQL | Executor prefetches candidate rows and extracts ybctids |
| 4 | YSQL | YBCLockTupleBatch builds a select request |
| 5 | YSQL/PgGate | PgGate adds ybctids as batch_arguments |
| 6 | YSQL -> tserver | Fetch triggers RPC |
| 7 | tserver | ReadQuery::DoPerform detects batched SKIP LOCKED |
| 8 | tserver | TryLockBatchArg tries candidate 0 |
| 9 | tserver | Tablet/DocDB creates intent for only that candidate |
| 10 | tserver | On conflict, try next candidate; on success, record winner |
| 11 | tserver | Read phase reads only winning candidate |
| 12 | tserver | Response includes first_locked_batch_arg_index |
| 13 | tserver -> YSQL | Response returns to PgGate |
| 14 | YSQL/PgGate | PgGate exposes winner index |
| 15 | YSQL | YBCLockTupleBatch maps winner to TM_Ok / TM_WouldBlock |
| 16 | YSQL | Executor returns winning tuple |
| 17 | YSQL | Executor saves later prefetched candidates as leftovers |
| 18 | YSQL | Future calls try leftovers before scanning more rows |
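The whole sequence can be condensed into a toy simulation: prefetch a batch, return the first lockable candidate, stash the rest as leftovers, and drain leftovers before prefetching again. All names here are illustrative, and a bool table stands in for which rows are currently unlocked:

```c
#include <stdbool.h>

#define N 4   /* max batch / leftover size for the demo */

typedef struct {
    int next_row;         /* next row the "child plan" would produce */
    int leftovers[N];
    int n_left, left_idx;
} LockRowsState;

/* Returns the next successfully locked row id, or -1 when the table of
 * total_rows rows is exhausted. batch_size must be <= N. */
static int next_locked_row(LockRowsState *s, const bool *free_rows,
                           int total_rows, int batch_size)
{
    for (;;) {
        /* Steps 17-18: try leftovers from the previous batch first. */
        while (s->left_idx < s->n_left) {
            int row = s->leftovers[s->left_idx++];
            if (free_rows[row])
                return row;
        }
        /* Steps 3-16: prefetch a fresh batch and find a winner. */
        int batch[N], count = 0;
        while (count < batch_size && s->next_row < total_rows)
            batch[count++] = s->next_row++;
        if (count == 0)
            return -1;                     /* nothing left to scan */
        s->n_left = s->left_idx = 0;
        for (int i = 0; i < count; i++) {
            if (free_rows[batch[i]]) {
                /* Save candidates after the winner as leftovers. */
                for (int j = i + 1; j < count; j++)
                    s->leftovers[s->n_left++] = batch[j];
                return batch[i];
            }
        }
        /* Whole batch conflicted; loop and prefetch the next batch. */
    }
}
```

With rows 1 and 2 free and rows 0 and 3 locked, successive calls yield 1, then 2 (from the leftovers, without rescanning), then -1.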


